Customer Cohort Analysis¶

Cohort analysis groups customers by a shared starting event and tracks their behavior over later periods. In this notebook, each customer is assigned to a cohort by the month of their first purchase, and every order is then measured in months since that first purchase. The steps below use pandas for the data preparation and seaborn with matplotlib for the heatmap.

Step 1: Import Libraries¶

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import calendar
import warnings
import json
import os
from datetime import datetime
from itertools import combinations
from collections import Counter

Step 2: Load the Dataset¶

In [3]:
# pandas and warnings are already imported in Step 1.
# Suppress library warnings to keep the notebook output readable.
warnings.filterwarnings("ignore")

Customer_data_df = pd.read_csv(r"C:\Users\jki\Desktop\Data Scence Projects\Customer Segmentaion Cohort Analysis\Machine Learnign Project\Source Data\customer orders.csv", encoding="ISO-8859-1")
Customer_data_df.head(5)
Out[3]:
| | Id | Full Name | Address | age | gender | Latitude | Longitude | Email Adress | Order Datetime | Order Status | Order Total | Items | Total sales | Order Count | Rating | Average Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | Adam Martinez | China, Beijing Shi | 30 | Female | 40.0 | 116.0 | adam.martinez@internalmail | 9/14/2021 | CANCELLED | 209 | Boy's Coat (Blue) | 80 | 1 | 8 | 6 |
| 1 | 20 | Adam Miller | 193, Bannerghatta Main Rd | 68 | Female | 13.0 | 78.0 | adam.miller@internalmail | 9/18/2021 | COMPLETE | 54 | Boy's Coat (Blue) | 10,816 | 90 | 4 | 6 |
| 2 | 30 | Adam Walker | Behrenstraße 42 | 70 | Female | 53.0 | 13.0 | adam.walker@internalmail | 9/22/2021 | COMPLETE | 43 | Boy's Coat (Blue) | 319 | 3 | 8 | 6 |
| 3 | 40 | Adan Lamica | Behrenstraße 42 | 72 | Female | 53.0 | 13.0 | adan.lamica@internalmail | 9/26/2021 | COMPLETE | 305 | Boy's Coat (Brown) | 137 | 3 | 5 | 6 |
| 4 | 50 | Adeline Iannotti | Floreasca Park 43 Soseaua | 16 | Female | 44.0 | 26.0 | adeline.iannotti@internalmail | 9/30/2021 | COMPLETE | 153 | Boy's Coat (Brown) | 3,936 | 64 | 6 | 6 |

Step 3: Convert Order Datetime to Datetime Format¶

In [4]:
# Convert 'Order Datetime' to datetime format
Customer_data_df['Order Datetime'] = pd.to_datetime(Customer_data_df['Order Datetime'], format='%m/%d/%Y')
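
The conversion above raises an error if any value does not match the M/D/YYYY format. A minimal sketch of a more defensive variant follows; the `raw_dates` sample is hypothetical and only illustrates the pattern, it is not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample in the same M/D/YYYY style as the 'Order Datetime' column.
raw_dates = pd.Series(["9/14/2021", "9/18/2021", "not a date"])

# errors="coerce" turns unparseable values into NaT instead of raising.
parsed_dates = pd.to_datetime(raw_dates, format="%m/%d/%Y", errors="coerce")

# Count the rows that failed to parse so they can be inspected before cohorting.
n_unparsed = int(parsed_dates.isna().sum())
print(parsed_dates)
print("Unparseable rows:", n_unparsed)
```

Rows left as NaT would need to be fixed or dropped before Step 4, since a missing date cannot be assigned to a cohort.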

Step 4: Extract the Cohort (Month of First Purchase)¶

In [5]:
# Extract the cohort (month of first purchase) for each customer
Customer_data_df['Cohort'] = Customer_data_df.groupby('Email Adress')['Order Datetime'].transform('min').dt.to_period('M')
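
To see what this step produces, here is a small sketch on a toy table. The `orders` frame and its `customer` and `order_date` columns are made up for illustration; the real notebook groups by 'Email Adress' and 'Order Datetime'.

```python
import pandas as pd

# Toy orders table (hypothetical data): two customers, five orders.
orders = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "b"],
    "order_date": pd.to_datetime(
        ["2022-09-03", "2022-11-20", "2022-10-01", "2022-10-15", "2023-01-05"]
    ),
})

# transform("min") broadcasts each customer's earliest order date to all of
# that customer's rows; to_period("M") truncates it to the calendar month.
orders["cohort"] = (
    orders.groupby("customer")["order_date"].transform("min").dt.to_period("M")
)
print(orders)
```

Every row of customer "a" gets cohort 2022-09 and every row of customer "b" gets 2022-10, regardless of when the individual order was placed.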

Step 5: Calculate the Cohort Index (Months Since First Purchase)¶

In [6]:
# Cohort index: whole months between each order's month and the customer's cohort month
Customer_data_df['Cohort Index'] = (Customer_data_df['Order Datetime'].dt.to_period('M') - Customer_data_df['Cohort']).apply(lambda x: x.n)
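
The month difference can also be computed with plain year and month arithmetic, which avoids the row-wise `apply`. This is a sketch of an equivalent alternative, not the notebook's own method; the `order_month` and `cohort_month` series are hypothetical stand-ins for the two period columns.

```python
import pandas as pd

# Hypothetical monthly periods for three orders and their customers' cohorts.
order_month = pd.Series(
    pd.to_datetime(["2022-09-03", "2022-11-20", "2023-01-05"])
).dt.to_period("M")
cohort_month = pd.Series(pd.PeriodIndex(["2022-09", "2022-09", "2022-10"], freq="M"))

# Months elapsed = 12 * (year difference) + (month difference).
cohort_index = (
    (order_month.dt.year - cohort_month.dt.year) * 12
    + (order_month.dt.month - cohort_month.dt.month)
)
print(cohort_index.tolist())
```

An index of 0 means the order falls in the same month as the customer's first purchase.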

Step 6: Group by Cohort and Cohort Index¶

In [7]:
# Group by Cohort and Cohort Index, then count the number of unique customers
cohort_data = Customer_data_df.groupby(['Cohort', 'Cohort Index'])['Email Adress'].nunique().reset_index()

Step 7: Pivot the Data to Create a Cohort Matrix¶

In [8]:
# Pivot the data to create a cohort matrix
cohort_pivot = cohort_data.pivot(index='Cohort', columns='Cohort Index', values='Email Adress')
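
Steps 6 and 7 together turn a long table of orders into a cohort-by-month matrix. The sketch below runs the same two calls on a toy frame; the data and the `customer`, `cohort`, and `cohort_index` column names are assumptions for illustration only.

```python
import pandas as pd

# Toy orders (hypothetical): customers a and b belong to the 2022-09 cohort,
# customer c to the 2022-10 cohort.
orders = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "c"],
    "cohort": pd.PeriodIndex(
        ["2022-09", "2022-09", "2022-09", "2022-09", "2022-10"], freq="M"
    ),
    "cohort_index": [0, 1, 0, 0, 0],
})

# Count distinct customers per (cohort, cohort_index) pair. Customer b's two
# orders in month 0 count once because of nunique().
counts = (
    orders.groupby(["cohort", "cohort_index"])["customer"].nunique().reset_index()
)

# One row per cohort, one column per month offset; missing pairs become NaN.
matrix = counts.pivot(index="cohort", columns="cohort_index", values="customer")
print(matrix)
```

Cells that stay NaN are cohort and offset combinations with no orders, which is why the heatmap in Step 8 has empty cells.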

Step 8: Visualize the Cohort Analysis¶

In [9]:
# Plot the cohort analysis
plt.figure(figsize=(12, 8))
sns.heatmap(cohort_pivot, annot=True, fmt='.0f', cmap='Blues', linewidths=0.5)
plt.title('Cohort Analysis - Customer Retention')
plt.xlabel('Cohort Index (Months since first purchase)')
plt.ylabel('Cohort (Month of first purchase)')
plt.show()
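[Figure: heatmap "Cohort Analysis - Customer Retention"; x-axis: Cohort Index (months since first purchase); y-axis: Cohort (month of first purchase)]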
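
The heatmap above plots raw counts of unique customers. Retention is often read as a share of the original cohort instead, so a possible follow-up is to divide each row by its month-0 value. The sketch below assumes a small hypothetical `cohort_counts` matrix shaped like `cohort_pivot`; the numbers are invented.

```python
import pandas as pd

# Hypothetical cohort matrix of unique-customer counts (rows: cohorts,
# columns: months since first purchase). NaN marks a month not yet observed.
cohort_counts = pd.DataFrame(
    {0: [100, 80], 1: [40, 20], 2: [25.0, float("nan")]},
    index=pd.PeriodIndex(["2022-09", "2022-10"], freq="M", name="Cohort"),
)

# Column 0 holds each cohort's size in its first month.
cohort_sizes = cohort_counts[0]

# Divide every row by its own cohort size to get retention as a fraction.
retention = cohort_counts.divide(cohort_sizes, axis=0)
print((retention * 100).round(1))
```

Passing a matrix like `retention` to `sns.heatmap` with a percentage format such as `fmt='.0%'` would show shares rather than counts.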

The rows of the heatmap are cohorts defined by the month of first purchase, mostly from 2022 and 2023. Each cell shows how many unique customers from that cohort placed an order at the given cohort index.

Cohort Identification: Each cohort is labeled by the month of first purchase. For example, "2022-09" contains the customers who made their first purchase in September 2022.
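
The "2022-09" style label is how pandas prints a monthly period. A short sketch, using a made-up first-purchase date, shows the mapping from a date to its cohort label.

```python
import pandas as pd

# Hypothetical first-purchase date for one customer.
first_purchase = pd.Timestamp("2022-09-17")

# Truncate the date to its calendar month; this is the cohort label.
cohort_label = first_purchase.to_period("M")
print(cohort_label)  # prints 2022-09

# The period covers the whole month, starting on the first day.
print(cohort_label.start_time)
```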
